
    Testing Data Transformations in MapReduce Programs

    MapReduce is a parallel data processing paradigm for handling large volumes of information in data-intensive applications, such as Big Data environments. A characteristic of these applications is that they can draw on different data sources and data formats. As a result, the inputs may contain poor-quality data that can produce a failure if the program functionality does not properly handle the variety of input data. The output of these programs is obtained through a number of transformations of the input that represent the program logic. This paper proposes a testing technique called MRFlow that is based on data flow test criteria and oriented to the analysis of the transformations between the input and the output in order to detect defects in MapReduce programs. MRFlow is applied to several MapReduce programs and detects several defects.
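
    As an illustration of the kind of transformation-level analysis such a technique targets, the following minimal sketch (an assumption for illustration, not the MRFlow implementation; the word-count program and the test data are invented) follows the data flow from each input record through the mapper and reducer transformations to the output:

        # Minimal sketch of transformation-level testing for a MapReduce
        # program. The word-count mapper/reducer and the test data are
        # illustrative assumptions, not MRFlow itself.
        from collections import defaultdict

        def mapper(line):
            # Transformation 1: split each input record into (word, 1) pairs.
            for word in line.split():
                yield word.lower(), 1

        def reducer(word, counts):
            # Transformation 2: aggregate all counts emitted for one key.
            return word, sum(counts)

        def run_pipeline(lines):
            groups = defaultdict(list)
            for line in lines:
                for key, value in mapper(line):
                    groups[key].append(value)
            return dict(reducer(k, v) for k, v in groups.items())

        # Data-flow-oriented check: each input definition of "big" (one per
        # record) must reach the output through both transformations.
        assert run_pipeline(["Big Data", "big data"]) == {"big": 2, "data": 2}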

    RETORCH: Resource-Aware End-to-End Test Orchestration

    12th International Conference, QUATIC 2019, Ciudad Real, Spain, September 11–13, 2019.
    Continuous integration practices introduce incremental changes in the code to both improve quality and add new functionality. These changes can introduce faults that can be detected in a timely manner through continuous testing, by automating the test cases and re-executing them at each code change. However, re-executing all test cases at each change is not always feasible, especially for those test cases that make heavy use of resources, like End-to-End test cases that need a complex test infrastructure. This paper focuses on optimizing the usage of the resources employed during End-to-End testing (e.g., storage, memory, web servers or tables of a database, among others) through a resource-aware test orchestration technique in the context of continuous integration in the cloud. In order to optimize both the cost/usage of resources and the execution time, the approach proposes to (i) identify the resources required by the End-to-End test cases, (ii) group together those tests that need the same resources, (iii) deploy the tests in dependency-isolated and elastic environments, and (iv) schedule their parallel execution on several machines.
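
    A minimal sketch of the grouping-and-scheduling idea, under the assumption that each test declares the resources it needs (the test names, resource annotations and two-worker schedule are invented; this is not the RETORCH implementation):

        # Sketch: group E2E tests by the resources they declare, then run
        # one group per worker so tests sharing a resource set reuse one
        # environment. All names and resources are illustrative.
        from collections import defaultdict
        from concurrent.futures import ThreadPoolExecutor

        TESTS = {
            "test_checkout":  {"web_server", "orders_table"},
            "test_cart":      {"web_server", "orders_table"},
            "test_search":    {"web_server", "index"},
            "test_reporting": {"warehouse"},
        }

        def group_by_resources(tests):
            # (i)-(ii): tests with identical resource sets share a group.
            groups = defaultdict(list)
            for name, resources in tests.items():
                groups[frozenset(resources)].append(name)
            return groups

        def run_group(resources, names):
            # (iii)-(iv): in a real setting each group would get its own
            # isolated, elastic environment; here execution is simulated.
            return f"{sorted(names)} ran with {sorted(resources)}"

        groups = group_by_resources(TESTS)
        with ThreadPoolExecutor(max_workers=2) as pool:  # two "machines"
            for result in pool.map(lambda kv: run_group(*kv), groups.items()):
                print(result)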

    Infrastructure-Aware Functional Testing of MapReduce Programs

    2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), Vienna, 2016.
    Programs that process a large volume of data generally run on a distributed and parallel architecture, such as programs implemented in the MapReduce processing model. In these programs, developers can abstract away the infrastructure where the program will run and focus on the functional issues. However, the infrastructure configuration and its state lead to different parallel executions of the program, and some of these can result in functional faults that are hard to reveal. In general, the infrastructure that executes the program is not considered during testing, because the tests usually contain little input data and parallelization is therefore not necessary. In this paper a testing technique is proposed that generates different infrastructure configurations for a given test input, and then executes the program under these configurations in order to reveal functional faults. This testing technique is automated by using a test engine and is applied to a case study. As a result, several infrastructure configurations are automatically generated and executed for a test case, revealing a functional fault that is then fixed by the developer.
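
    The idea can be sketched as follows, assuming a simulated runner where the "infrastructure configuration" is just the number of input partitions, and assuming a deliberately faulty program whose reducer is also applied per-partition as a combiner (only correct for commutative/associative operations; the program and data are invented):

        # Sketch: execute the same test input under several generated
        # infrastructure configurations and compare outputs. The faulty
        # "average" job is an illustrative assumption.
        def avg_reduce(values):
            return sum(values) / len(values)

        def run(values, partitions):
            # Split the input as the infrastructure would, pre-aggregate
            # each partition with the reducer (combiner), then reduce the
            # partial results.
            chunks = [values[i::partitions] for i in range(partitions)]
            partials = [avg_reduce(c) for c in chunks if c]
            return avg_reduce(partials)

        test_input = [1, 2, 3, 4, 10]
        expected = avg_reduce(test_input)      # 4.0 with one partition
        for partitions in (1, 2, 3):           # generated configurations
            got = run(test_input, partitions)
            status = "OK" if got == expected else "FAULT REVEALED"
            print(f"{partitions} partition(s): {got:.3f} {status}")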

    Towards Ex Vivo Testing of MapReduce Applications

    2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), 25-29 July 2017, Prague, Czech Republic.
    Big Data programs are those that process data so large that it exceeds the capabilities of traditional technologies. Among the newly proposed processing models, MapReduce stands out because it allows the analysis of schema-less data in large distributed environments with frequent infrastructure failures. Functional faults in MapReduce are hard to detect in a testing/preproduction environment due to its distributed characteristics. We propose an automatic test framework implementing a novel testing approach called Ex Vivo. The framework employs data from production but executes the tests in a laboratory to avoid side effects on the application. Faults are detected automatically, without human intervention, by checking whether the same data would generate different outputs under different infrastructure configurations. The framework (MrExist) is validated with a real-world program. MrExist can identify a fault in a few seconds; the program can then be stopped, not only avoiding an incorrect output but also saving the money, time and energy of production resources.
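
    The core check can be sketched as a differential oracle over a production sample (the runner, sample and configurations below are invented stand-ins; this is not MrExist):

        # Sketch of the Ex Vivo oracle: replay a sample of production
        # records in the lab under several infrastructure configurations
        # and flag a fault as soon as two configurations disagree.
        def run_pipeline(records, partitions):
            # Assumed stand-in for the deployed job: a per-partition
            # average whose result (incorrectly) depends on partitioning.
            chunks = [records[i::partitions] for i in range(partitions)]
            partials = [sum(c) / len(c) for c in chunks if c]
            return sum(partials) / len(partials)

        def ex_vivo_check(production_sample, configurations):
            baseline = run_pipeline(production_sample, configurations[0])
            for config in configurations[1:]:
                if run_pipeline(production_sample, config) != baseline:
                    return f"fault: output differs under {config} partitions"
            return "no divergence observed"

        print(ex_vivo_check([1, 2, 3, 4, 10], [1, 2, 3]))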

    Functional testing techniques for new massive data processing paradigms

    Big Data programs are those that analyse information using new processing models that overcome the limitations of traditional technology with respect to the volume, velocity and variety of the data processed. Among them, MapReduce stands out by allowing the processing of large data over a distributed infrastructure that can change during runtime due to frequent infrastructure failures and optimizations. The developer only designs the program, whereas the execution of its functionality is managed by a distributed system, including the allocation of resources and the fault tolerance mechanism, among others. As a consequence, a program can behave differently at each execution because it automatically adapts to the resources available at each moment. This non-deterministic execution makes both software testing and debugging difficult, especially for MapReduce programs with a complex design. Although both performance and functionality are important, the majority of the research about the quality of MapReduce programs focuses on performance. In contrast, there are few research studies about functionality, even though several MapReduce applications fail regularly due to functional faults. Testing and debugging these faults is important, especially when the MapReduce programs perform a critical task.
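
    A minimal sketch of the non-determinism described above, under the assumption of a reducer that incorrectly relies on the arrival order of its values (the job and data are invented; in a real cluster the shuffle order depends on the infrastructure state):

        # Sketch: a reducer that assumes its values arrive in input order.
        # Because the simulated shuffle order varies, the same program and
        # data can yield different outputs across executions.
        import random

        def reducer(values):
            return values[0]   # intended: "first event per key"

        events = ["login", "browse", "purchase"]

        outputs = set()
        for _ in range(10):
            shuffled = random.sample(events, k=len(events))  # simulated shuffle
            outputs.add(reducer(shuffled))

        # More than one distinct output means the result depends on the
        # non-deterministic execution: a hard-to-reveal functional fault.
        print(outputs)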

    Test-Driven Anonymization for Artificial Intelligence

    In recent years, the amount of data published and shared with third parties to develop artificial intelligence (AI) tools and services has significantly increased. When there are regulatory or internal requirements regarding the privacy of data, anonymization techniques are used to maintain privacy by transforming the data. The side effect is that anonymization may render the data useless for training and testing the AI, because AI performance is highly dependent on the quality of the data. To overcome this problem, we propose a test-driven anonymization approach for artificial intelligence tools. The approach tests different anonymization efforts to achieve a trade-off between privacy (non-functional quality) and the functional suitability of the artificial intelligence technique (functional quality). The approach has been validated by means of two real-life datasets in the domains of healthcare and health insurance. Each of these datasets is anonymized with several privacy protections and then used to train classification AIs. The results show how we can anonymize the data to achieve adequate functional suitability in the AI context while keeping the privacy of the anonymized data as high as possible.
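
    A minimal sketch of the trade-off loop, assuming generalization-based anonymization of a single quasi-identifier, k-anonymity as the privacy metric and the accuracy of a simple group-majority classifier as functional suitability (all data, metrics and thresholds are invented; this is not the paper's implementation):

        # Sketch of test-driven anonymization: increase the anonymization
        # effort step by step, measuring privacy (k-anonymity over the
        # generalized quasi-identifier) and utility (accuracy of a simple
        # classifier trained on the anonymized data).
        from collections import Counter

        records = [(23, "healthy"), (25, "healthy"), (34, "sick"),
                   (37, "sick"), (41, "sick"), (45, "healthy"),
                   (52, "sick"), (58, "sick")]

        def anonymize(records, bucket):
            # Generalize age into ranges of width `bucket`.
            return [(age // bucket, label) for age, label in records]

        def k_anonymity(data):
            # Smallest group sharing the same quasi-identifier value.
            return min(Counter(qi for qi, _ in data).values())

        def accuracy(data):
            # Utility proxy: predict the majority label of each group.
            majority = {}
            for qi, label in data:
                majority.setdefault(qi, Counter())[label] += 1
            hits = sum(1 for qi, label in data
                       if majority[qi].most_common(1)[0][0] == label)
            return hits / len(data)

        for bucket in (5, 10, 20, 40):   # increasing anonymization effort
            data = anonymize(records, bucket)
            print(f"bucket={bucket:>2}  k={k_anonymity(data)}  "
                  f"accuracy={accuracy(data):.2f}")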

    Debugging flaky tests on web applications

    15th International Conference on Web Information Systems and Technologies (WEBIST), 2019, Vienna, Austria.